Mining Maximal Frequent Subtrees based on Fusion Compression and FP-tree

Authors

Xiaoke Wu

Xiongfei Li

Abstract

It is commonly accepted that mining frequent subtrees plays a pivotal role in areas such as Web log analysis, XML document analysis, semi-structured data analysis, biometric information analysis, and chemical compound structure analysis. This paper proposes MFPTM, an improved algorithm for mining maximal frequent subtrees that is based on fusion compression and the FP-tree principle. The algorithm first applies fusion compression to retain only subtrees that contain frequent nodes, and then mines frequent subtrees according to the FP-tree principle. In doing so, MFPTM aims to reduce the search space of candidate patterns. It also aims to address the main weakness of Apriori-based frequent pattern mining, which generates a large number of candidate patterns. As a result, MFPTM improves the efficiency of mining frequent subtrees.
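The two steps named in the abstract can be illustrated with a small Python sketch. It is not the authors' MFPTM implementation, because the abstract does not give the exact fusion-compression procedure or the way subtrees are mapped onto FP-tree items. Several things are therefore assumptions made for illustration. The first is the pruning rule: delete every node whose label occurs in fewer than `minsup` trees and keep each remaining connected piece as its own subtree. The second is the parent/child "edge items" produced by the hypothetical helper `edge_items`. The third is every name in the code (`Node`, `frequent_labels`, `prune`, `FPNode`, `build_fp_tree`). The FP-tree part follows the standard construction. Infrequent items are dropped, and each transaction is inserted as a path in descending support order, so transactions with a common prefix share nodes.

```python
# Illustrative sketch only, not the MFPTM authors' code: the pruning rule and
# the edge-item encoding below are assumptions made for this example.
from __future__ import annotations

from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """A labeled node of an unordered rooted tree."""
    label: str
    children: list[Node] = field(default_factory=list)


def labels_in(tree: Node) -> set[str]:
    """Distinct labels occurring anywhere in one tree."""
    found = {tree.label}
    for child in tree.children:
        found |= labels_in(child)
    return found


def frequent_labels(db: list[Node], minsup: int) -> set[str]:
    """Labels occurring in at least `minsup` trees (support counted per tree)."""
    counts = Counter(label for tree in db for label in labels_in(tree))
    return {label for label, c in counts.items() if c >= minsup}


def prune(tree: Node, keep: set[str]) -> list[Node]:
    """Assumed pruning step: delete nodes whose label is not in `keep`; each
    surviving connected piece becomes a separate subtree of frequent nodes."""
    pieces: list[Node] = []

    def walk(node: Node) -> Node | None:
        kids = [k for k in (walk(c) for c in node.children) if k is not None]
        if node.label in keep:
            return Node(node.label, kids)
        pieces.extend(kids)  # children of a deleted node become new roots
        return None

    root = walk(tree)
    if root is not None:
        pieces.append(root)
    return pieces


def edge_items(tree: Node) -> set[str]:
    """Hypothetical encoding: one item per parent/child label pair."""
    items: set[str] = set()
    for child in tree.children:
        items.add(f"{tree.label}/{child.label}")
        items |= edge_items(child)
    return items


class FPNode:
    def __init__(self, item: str | None, parent: FPNode | None) -> None:
        self.item = item
        self.count = 0
        self.parent = parent
        self.children: dict[str, FPNode] = {}


def build_fp_tree(transactions: list[set[str]], minsup: int):
    """Standard FP-tree build: frequent items of each transaction, sorted by
    descending support, are inserted as one path sharing common prefixes."""
    counts = Counter(item for t in transactions for item in t)
    freq = {item: c for item, c in counts.items() if c >= minsup}
    root = FPNode(None, None)
    header: dict[str, list[FPNode]] = defaultdict(list)  # item -> its nodes
    for t in transactions:
        path = sorted((i for i in t if i in freq), key=lambda i: (-freq[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header


# Usage on a toy database of three trees.
t1 = Node("a", [Node("b", [Node("c")]), Node("x", [Node("d")])])
t2 = Node("a", [Node("b"), Node("d")])
t3 = Node("a", [Node("b", [Node("c")])])
db = [t1, t2, t3]

keep = frequent_labels(db, minsup=2)   # "x" occurs in only one tree, so it is dropped
pruned = [prune(t, keep) for t in db]  # t1 splits into d and a(b(c))
transactions = [set().union(*(edge_items(p) for p in pieces)) for pieces in pruned]
root, header = build_fp_tree(transactions, minsup=2)
```

Here support is counted once per tree rather than once per occurrence. That is one common convention, and the paper may use a different one. Extracting maximal patterns from the resulting FP-tree, for example via conditional pattern bases as in FP-growth, is not shown.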

Similar Resources

CMTreeMiner: Mining Both Closed and Maximal Frequent Subtrees

Tree structures are used extensively in domains such as computational biology, pattern recognition, XML databases, computer networks, and so on. One important problem in mining databases of trees is to find frequently occurring subtrees. However, because of the combinatorial explosion, the number of frequent subtrees usually grows exponentially with the size of the subtrees. In this paper, we p...

Fast Extraction of Maximal Frequent Subtrees Using Bits Representation

With the continuous growth in XML data sources over the Internet, the discovery of useful information from a collection of XML documents is currently one of the main research areas occupying the data mining community. The most commonly adopted approach to this task is to extract frequently occurring subtree patterns from XML trees. But, the number of frequent subtrees usually grows exponentiall...

Efficient Data Mining for Maximal Frequent Subtrees

A new type of tree mining is defined in this paper, which uncovers maximal frequent induced subtrees from a database of unordered labeled trees. A novel algorithm, PathJoin, is proposed. The algorithm uses a compact data structure, FST-Forest, which compresses the trees and still keeps the original tree structure. PathJoin generates candidate subtrees by joining the frequent paths in FST-Forest...

Smart frequent itemsets mining algorithm based on FP-tree and DIFFset data structures

Association rule data mining is an important technique for finding important relationships in large datasets. Several frequent itemsets mining techniques have been proposed using a prefix-tree structure, FP-tree, a compressed data structure for database representation. The DIFFset data structure has also been shown to significantly reduce the run time and memory utilization of some data mining ...

Discovering Frequent Embedded Subtree Patterns from Large Databases of Unordered Labeled Trees

Recent years have witnessed a surge of research interest in knowledge discovery from data domains with complex structures, such as trees and graphs. In this paper, we address the problem of mining maximal frequent embedded subtrees which is motivated by such important applications as mining “hot” spots of Web sites from Web usage logs and discovering significant “deep” structures from tree-like...

Publication date: 2011
